Computerized adaptive testing is becoming increasingly popular
due to the advancement of modern computer technology. It differs from
conventional standardized testing in that the selection of test
items is tailored to the individual examinee's ability level. Arising
from this selection strategy is a nonlinear sequential design problem.
In this paper, we study the sequential design problem in the context of
the logistic item response theory models. We show that the adaptive
design obtained by maximizing the item information leads to a
consistent and asymptotically normal ability estimator in the case of
the Rasch model. Modifications to the maximum information approach
are proposed for the two- and three-parameter logistic models.
Similar asymptotic properties are established for the modified designs
and the resulting estimator. Examples are also given in the case of
the two-parameter logistic model to show that, without such
modifications, the maximum likelihood estimator of the ability
parameter may not be consistent.

1. Introduction. Computerized adaptive testing (CAT) is becoming
increasingly popular due to the advancement of modern computer
technology. The concept of adaptive testing was originally conceived by
Lord (1971) in his attempt to utilize the stochastic approximation
algorithm of Robbins and Monro (1951) for designing more efficient
tests. Major advances were carried out and documented in Owen (1975),
Weiss (1976) and Wainer (2000). A distinctive feature of adaptive
testing is to tailor test items (questions) to each examinee's ability
level, so that able examinees can avoid doing too many easy items and
less able examinees can avoid doing too many difficult items.
Specifically, if the examinee answers a question correctly
(incorrectly), then the next question administered to him/her will tend
to be more difficult (easier). Through such an adaptive approach,
questions with difficulty levels suitable to a specific examinee are
likely to be allocated. In consequence, examinees are challenged but
not discouraged, so that their ability levels are measured more
accurately with the same number of items as, or fewer items than, a
conventional test. Rapid development of computer technology has made
adaptive testing a very promising option and, to a certain extent, the
future of standardized tests. For example, computerized adaptive tests
have already been implemented in the GRE, the Graduate Record
Examination, and the GMAT, the Graduate Management Admission Test.

Both theoretical and implementational aspects of adaptive testing rely
heavily on the item response theory (IRT) models, which relate
examinees' ability levels to their responses to test items. Suppose
that an examinee's ability level is characterized by a single parameter
$\theta$. A basic assumption of the IRT is that, for a given item, the
probability of producing a correct answer depends only on the
examinee's ability parameter $\theta$. The resulting probability curve,
as $\theta$ varies, is known as the item characteristic curve (ICC) of
the given item. Different parametrizations of the ICC lead to different
IRT models.

Rasch (1960) proposed using the family of shifted logistic functions,
$\exp(\theta - b)/(1 + \exp(\theta - b))$, to model the ICC. Here, $b$
determines the position of the ICC along the ability scale and is known
as the item difficulty parameter. The exponent $\theta - b$ may be
replaced by $1.7(\theta - b)$ to bring the curve closer to the standard
normal distribution function. The latter will not be used in this
paper, however, for mathematical simplicity. Let $Y$ denote an
examinee's response, with value 1 indicating a correct answer and 0 an
incorrect answer, to an item whose ICC follows the Rasch model with
difficulty $b$. Then,

(1.1)  $P(Y = 1 \mid \theta) = \frac{e^{\theta - b}}{1 + e^{\theta - b}},$

where $\theta$ denotes the ability level of the examinee.

A more general model, which includes the Rasch model as a special case,
is the so-called three-parameter logistic (3-PL) model, whose ICC is
defined by

(1.2)  $P(Y = 1 \mid \theta) = c + (1 - c)\,\frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}},$

where $Y$, $\theta$ and $b$ have the same interpretations as those in
(1.1) and where the additional item parameters $c$ and $a$ measure,
respectively, the degree of guessing and the discriminating power [see
Lord (1980), Hambleton and Swaminathan (1985)]. The Rasch model
corresponds to the situation in which there is no guessing ($c = 0$)
and all items have the same discriminating power ($a = 1$ when properly
scaled). An intermediate model is

(1.3)  $P(Y = 1 \mid \theta) = \frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}},$

which is known as the two-parameter logistic (2-PL) model.
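
As an illustration only, the three item characteristic curves can be
written as one function, since the Rasch and 2-PL models are the special
cases $c = 0$, $a = 1$ and $c = 0$ of the 3-PL curve $c + (1 - c)
e^{a(\theta - b)}/(1 + e^{a(\theta - b)})$. The following is a minimal
Python sketch; the function names `logistic` and `icc` and the numerical
values are hypothetical choices made for this example and do not come
from the paper.

```python
import math


def logistic(z):
    """Numerically stable G(z) = e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def icc(theta, b, a=1.0, c=0.0):
    """P(Y = 1 | theta) for an item with difficulty b, discrimination a
    and guessing parameter c (3-PL form; defaults give the Rasch model)."""
    return c + (1.0 - c) * logistic(a * (theta - b))


# Rasch item matched to the examinee: probability one half.
p_rasch = icc(theta=0.5, b=0.5)
# 2-PL item with a = 2, examinee one unit above the difficulty.
p_2pl = icc(theta=1.0, b=0.0, a=2.0)
# 3-PL item: a very weak examinee still succeeds at about the rate c.
p_3pl = icc(theta=-50.0, b=0.0, a=1.0, c=0.2)
print(p_rasch, p_2pl, p_3pl)
```

The lower asymptote $c$ is what distinguishes the 3-PL curve: as
$\theta \to -\infty$ the probability of a correct answer tends to $c$
rather than to 0.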

The conventional IRT model-based design of a test is the advance
selection of a set of $n$ items whose parameters have been
precalibrated (known). For each examinee, there are $n$ responses, say
$Y_1, \ldots, Y_n$, to the $n$ items. Point and interval estimation of
$\theta$ for the examinee can then be obtained by, for example,
maximizing the likelihood function of $\theta$ with $Y_1, \ldots, Y_n$
and calculating the observed Fisher information, or by other methods
that can be found in the statistical literature. Lord (1980) contains
detailed descriptions of relevant statistical inference procedures and
the theory thereof.

The main focus of the present investigation is on the IRT model-based
adaptive design of computerized tests. An adaptive test differs from a
conventional test in that the assignment of the test items is performed
sequentially, with the selection of each item depending on the
responses of the examinee to the preceding items. More specifically,
let $\mathcal{A}$ be the item bank from which items may be selected and
assigned to the examinee. Suppose that $k - 1$ items,
$\alpha_1, \ldots, \alpha_{k-1} \in \mathcal{A}$, have already been
selected and that the responses from the examinee are
$Y_1, \ldots, Y_{k-1}$. The selection of the $k$th item, $\alpha_k$,
will be based on the previous items $\alpha_1, \ldots, \alpha_{k-1}$ as
well as the responses $Y_1, \ldots, Y_{k-1}$. Arising from this
formulation are three aspects that may be studied: (1) design of an
adaptive rule for selection of test items $\alpha_1, \alpha_2, \ldots$,
(2) sequential estimation of the ability parameter $\theta$ at each
stage, and (3) properties of the adaptive design and the resulting
estimator.

Lord (1980), Chapter 10, argued that, for a given examinee, the items
should be selected to maximize the Fisher information. Let
$P_\alpha(\theta)$ be the probability that an examinee with ability
$\theta$ answers item $\alpha$ correctly. The Fisher information
function (of $\theta$) for $\alpha$ is simply

(1.4)  $I_\alpha(\theta) = \frac{[P'_\alpha(\theta)]^2}{P_\alpha(\theta)\,Q_\alpha(\theta)},$

where $Q_\alpha(\theta) = 1 - P_\alpha(\theta)$. If $\theta$ were known,
then the optimal choice, according to Lord, is the one that maximizes
$I_\alpha(\theta)$. Although in reality we do not know $\theta$, the
sequential approach allows us to use the current estimate of $\theta$
in deciding the next choice of $\alpha$. Our results, to be presented
in this paper, indicate that, for the Rasch model, such an approach
leads to an asymptotically optimal design and that, for the
two-parameter and three-parameter logistic models, the approach does
not in general lead to an optimal design. In fact, the procedure needs
to be modified in order to produce a reasonable design. Note that,
throughout the paper, the term optimal means that the adaptive design
leads to a consistent and asymptotically normal ability estimator.

Despite the increased prominence of CAT in standardized testing, in- 
depth statistical analysis has yet to be developed. The present paper is aimed 
at providing some basic results in certain idealized situations. It is organized 
as follows. The Rasch model is studied in Section 2 in the context of the 
adaptive design and maximum likelihood estimation. It is shown there that 
the maximum Fisher information-based sequential design, in conjunction 
with updating maximum likelihood recursion, is asymptotically optimal, and 
the resulting maximum likelihood estimator is consistent and asymptotically 
normal. In Section 3, a modification to the maximum Fisher information- 
based design for the two-parameter logistic model is proposed, and the re- 
sulting maximum likelihood estimator is shown to be consistent and asymp- 
totically normal. A counterexample is also given to illustrate the necessity of 
the proposed modification. Treatment of the general three-parameter logistic 
model is given in Section 4, where, in addition to modifying the maximum 
Fisher information design, we also propose an approximation to the maxi- 
mum likelihood estimating equation. The usual large sample properties are 
established accordingly. Discussions and some concluding remarks are given 
in Section 5. 

2. Information-based adaptive design for the Rasch model. Recall that,
under the Rasch model, the probability of answering an item correctly by
an examinee with ability parameter $\theta$ is
$\exp(\theta - b)/[1 + \exp(\theta - b)]$, where $b$ is the item
parameter representing the difficulty level. From (1.4), the Fisher
information of the item can be written as

$I_b(\theta) = \frac{e^{\theta - b}}{(1 + e^{\theta - b})^2}.$

For a given examinee, $I_b(\theta)$ attains its maximum value 1/4 at
$b = \theta$. Therefore, the optimal design is to select items with
difficulty parameter $b = \theta$. Since $\theta$ is unknown,
successive approximations to the optimal design will be needed.
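
The claim that the Rasch item information peaks at $b = \theta$ with
value 1/4 can be checked numerically. The sketch below is illustrative
only: it evaluates $P(1 - P)$, which equals the Rasch information
$e^{\theta - b}/(1 + e^{\theta - b})^2$, on a grid of difficulties. The
function name `rasch_information`, the value $\theta = 0.7$ and the grid
are assumptions made for this example.

```python
import math


def rasch_information(theta, b):
    """Fisher information of a Rasch item: P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


theta = 0.7  # hypothetical true ability
# Candidate difficulties theta - 3.00, ..., theta, ..., theta + 3.00.
grid = [theta + 0.01 * k for k in range(-300, 301)]
best_b = max(grid, key=lambda b: rasch_information(theta, b))
print(best_b, rasch_information(theta, best_b))
```

The information is symmetric in $\theta - b$ and decreases on either
side of zero, which is why matching difficulty to ability is the
idealized optimal design.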

A general recursive algorithm known as stochastic approximation for
approximating optimal design points was first proposed by Robbins and
Monro (1951). Lord (1971) discovered the use of stochastic
approximation in developing adaptive (tailored) tests. Wu (1985)
introduced a maximum likelihood recursion as an alternative to
stochastic approximation when the underlying response curve is of the
logistic form. He further showed, through extensive simulation studies,
that his maximum likelihood recursion improves efficiency over
stochastic approximation when the sample size is moderate.

In this section, we first consider an idealized setting for CAT in
which available items at each stage exhaust all difficulty levels. In
other words, for every $b$, an item with ICC
$\{\exp(\theta - b)/[1 + \exp(\theta - b)], \theta \in R\}$ can be
administered to the examinee. We will then consider more realistic
situations for which available items are limited, so that we can at
best choose items that are closest to the idealized optimal ones.
Results for the idealized CAT will be developed and then extended to
the more realistic situations.

For the idealized CAT, the sequential design based on maximizing the
Fisher information and updating maximum likelihood estimators consists
of the following steps:

1. Initialization. Specify the difficulty level, say $b_1$, of the
initial item. If the examinee's response is correct (i.e., $Y_1 = 1$),
then choose the succeeding items with increasing difficulty parameters
$(b_1 <)\, b_2 < b_3 < \cdots < b_{k_0}$, where
$k_0 = \inf\{j : Y_j = 0\}$ is the first time an incorrect response
occurs. On the other hand, if the response to the first item is
incorrect, then select the succeeding items with decreasing difficulty
parameters $(b_1 >)\, b_2 > b_3 > \cdots > b_{k_0}$, where
$k_0 = \inf\{j : Y_j = 1\}$.

2. Estimation. For each $k \ge k_0$, define $\hat\theta_k$ by solving
the maximum likelihood estimating equation

$\sum_{i=1}^{k} \left[ Y_i - \frac{e^{\theta - b_i}}{1 + e^{\theta - b_i}} \right] = 0.$

Since the response sequence $\{Y_1, \ldots, Y_k\}$ contains both 0 and
1, $\hat\theta_k$ is uniquely and well defined.
3. Design. After $k\,(\ge k_0)$ items are administered and
$\hat\theta_k$ is obtained, select the next item by setting
$b_{k+1} = \hat\theta_k$. Note that this selection is simply the
idealized optimal design, but with the unknown parameter $\theta$ being
replaced by its most recent estimator.
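
The three steps above can be sketched as a small simulation for a single
examinee. This is an illustrative sketch under stated assumptions, not
the authors' implementation: the fixed step size used during
initialization, the bisection bracket $[-20, 20]$, the tolerance, the
random seed and all function names are hypothetical choices. Bisection
is used only because the Rasch score function is strictly decreasing in
$\theta$.

```python
import math
import random


def logistic(z):
    """Numerically stable G(z) = e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def rasch_mle(difficulties, responses, lo=-20.0, hi=20.0, tol=1e-8):
    """Solve sum_i [Y_i - G(theta - b_i)] = 0 by bisection.

    Assumes the responses contain both 0 and 1 and that the root lies
    in the (hypothetical) bracket [lo, hi].
    """
    def score(theta):
        return sum(y - logistic(theta - b)
                   for b, y in zip(difficulties, responses))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:   # score is decreasing, so the root is above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_rasch_cat(theta, n_items, b1=0.0, step=0.5, seed=0):
    """Run steps 1-3 for one simulated examinee; return all estimates."""
    rng = random.Random(seed)
    difficulties, responses, estimates = [], [], []
    b = b1
    for _ in range(n_items):
        y = 1 if rng.random() < logistic(theta - b) else 0
        difficulties.append(b)
        responses.append(y)
        if 0 < sum(responses) < len(responses):
            # Steps 2 and 3: estimate theta, then set b_{k+1} to it.
            b = rasch_mle(difficulties, responses)
            estimates.append(b)
        else:
            # Step 1: responses all identical so far; move monotonically.
            b = b + step if y == 1 else b - step
    return estimates


estimates = simulate_rasch_cat(theta=1.0, n_items=300, seed=1)
print(estimates[-1])
```

With 300 items, the asymptotic standard deviation $2/\sqrt{n}$ is about
0.12, so the final estimate would be expected to lie close to the
simulated ability.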

The preceding adaptive testing procedure was proposed and discussed in
Lord (1971, 1980). It was also studied in the context of sequential
optimal design in Wu (1985), where its connection to Robbins and
Monro's stochastic approximation algorithm was found. Ying and Wu
(1997) established an asymptotic theory for a class of sequential
design problems. The next theorem shows that the sequential estimator
$\hat\theta_n$ is consistent and asymptotically normal. It entails that
the adaptive design is asymptotically efficient.

Theorem 1. Let $\{\hat\theta_k\}$ be the sequential estimators
specified by steps 1-3 for the Rasch model. Then, as $n \to \infty$,
$\hat\theta_n \to \theta$ a.s. and
$\sqrt{n/4}\,(\hat\theta_n - \theta) \to_{\mathcal{L}} N(0, 1)$.
Furthermore, $4 I_n(\theta)/n \to 1$ a.s., where
$I_n(\theta) = \sum_{i=1}^{n} \exp(\theta - b_i)/(1 + \exp(\theta - b_i))^2$
is the observed Fisher information.

The asymptotic variance for $\hat\theta_n$ is $4/n$, which is exactly
the inverse of the Fisher information if all the $n$ items are chosen
optimally (i.e., $b_i = \theta$). Thus, the estimator $\hat\theta_n$ is
asymptotically optimal. However, under the more realistic situation in
which the item bank has limited capacity, that is, $b_i$ can only be
chosen from a set of discrete values, the consistency and asymptotic
normality for $\hat\theta_k$ still hold, but the asymptotic variance
needs to be replaced by the inverse of the Fisher information.

Theorem 1 is implied by the more general result given by Theorem 2. It 
can also be inferred from Ying and Wu (1997), Theorem 1. Proof of Theorem 
2 uses the so-called local convergence theorem for martingale sequences. 

3. The two-parameter logistic model. Recall that the two-parameter
logistic model is an extension of the Rasch model to include a second
item parameter $a$, which represents the discriminating power of the
item. Under this model, an examinee with ability $\theta$ answers an
item, specified by $a$ and $b$, correctly with probability
$e^{a(\theta - b)}/[1 + e^{a(\theta - b)}]$ as in (1.3).

The Fisher information function for an item specified by $a$ and $b$
may be expressed as

(3.1)  $I(\theta \mid a, b) = a^2\,\frac{e^{a(\theta - b)}}{[1 + e^{a(\theta - b)}]^2}.$

If $a$ and $b$ are unrestricted, then the information-based optimal
design problem is singular because

(3.2)  $\max_{a,b} I(\theta \mid a, b) \left( = \max_a \max_b I(\theta \mid a, b) \right) = \max_a \frac{a^2}{4} = \infty.$

From (3.2), the optimal design appears to be $b = \theta$ and
$a = \infty$. But this will be extremely unstable since, for any
$b \ne \theta$,

$\lim_{a \to \infty} I(\theta \mid a, b) = 0.$

One way to avoid such singularity is to restrict the item pool so that
the parameter $a$ will fall into a compact interval in $(0, \infty)$.
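
The singularity can be seen numerically. The sketch below is
illustrative only: it evaluates the 2-PL information
$a^2 G(a(\theta - b))[1 - G(a(\theta - b))]$ for growing $a$, once with
the difficulty matched exactly ($b = \theta$) and once with a mismatch
of half a unit. The function names and the numerical values are
assumptions made for this example.

```python
import math


def logistic(z):
    """Numerically stable G(z) = e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def info_2pl(theta, a, b):
    """Fisher information of a 2-PL item, a^2 * G * (1 - G)."""
    g = logistic(a * (theta - b))
    return a * a * g * (1.0 - g)


theta = 0.0
for a in (1.0, 10.0, 100.0):
    matched = info_2pl(theta, a, b=theta)           # equals a^2 / 4
    mismatched = info_2pl(theta, a, b=theta + 0.5)  # tends to 0 as a grows
    print(a, matched, mismatched)
```

A highly discriminating item is therefore very informative only when the
difficulty is already close to the ability, which is the instability
that motivates restricting $a$ to a compact interval.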

Analogous to the adaptive design for the Rasch model, we introduce a
similar design for the two-parameter logistic model. However, to avoid
the singularity, we shall put a restriction on the discrimination
parameter $a$. Specifically, let $0 < m < M < \infty$ be fixed in
advance, and assume $a \in [m, M]$.

1. Initialization. Select the initial coin (item) with parameters $a_1$
and $b_1$. A reasonable choice for them can be made from prior
information about the population. If the outcome of the first toss
$Y_1$ is 1 (head), then choose the next coins with increasing
difficulty parameters $(b_1 <)\, b_2 < \cdots < b_{k_0}$, where
$k_0 = \inf\{j : Y_j = 0\}$ is again the first time a tail occurs. If
the first toss is a tail, then choose
$(b_1 >)\, b_2 > \cdots > b_{k_0}$ with $k_0$ being the first head. The
$a$-parameters must satisfy $m \le a_j \le M$, $j = 1, \ldots, k_0$,
but can be arbitrary otherwise.
2. Estimation. For each $k \ge k_0$, define $\hat\theta_k$, the maximum
likelihood estimator, as the unique solution to

(3.3)  $\sum_{i=1}^{k} a_i \left[ Y_i - \frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}} \right] = 0.$

Note that the left-hand side of (3.3) is a strictly decreasing function
of $\theta$ with values ranging from $-\sum_{i=1}^{k} a_i(1 - Y_i) < 0$
to $\sum_{i=1}^{k} a_i Y_i > 0$.
3. Design. After $\hat\theta_k$ is defined, set
$b_{k+1} = \hat\theta_k$ and $a_{k+1}$ to be a number in $[m, M]$. The
choice for $a_{k+1}$ can depend on data collected up to the current
stage. The next selection will be the coin (item) with parameters
$a_{k+1}$ and $b_{k+1}$.
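
The design just described can be sketched in the same way as for the
Rasch model. The code below is a minimal illustration under stated
assumptions: the estimating equation solved is
$\sum_i a_i[Y_i - G(a_i(\theta - b_i))] = 0$ with
$G(t) = e^t/(1 + e^t)$, each $a_{k+1}$ is drawn uniformly from
$[m, M]$ (one arbitrary admissible rule; the design allows any choice
in that interval), and the step size, bisection bracket, seed and
function names are hypothetical.

```python
import math
import random


def logistic(z):
    """Numerically stable G(z) = e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def mle_2pl(a, b, y, lo=-20.0, hi=20.0, tol=1e-8):
    """Bisection root of sum_i a_i [Y_i - G(a_i (theta - b_i))] = 0.

    The left-hand side is strictly decreasing in theta; the bracket
    [lo, hi] is an assumption of this sketch.
    """
    def score(theta):
        return sum(ai * (yi - logistic(ai * (theta - bi)))
                   for ai, bi, yi in zip(a, b, y))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_2pl_cat(theta, n_items, m=0.5, M=2.0, b1=0.0, step=0.5, seed=0):
    """Simulate the restricted design with a_k in [m, M]."""
    rng = random.Random(seed)
    a, b, y = [], [], []
    next_b, estimate = b1, None
    for _ in range(n_items):
        next_a = rng.uniform(m, M)  # hypothetical rule for choosing a_k
        resp = 1 if rng.random() < logistic(next_a * (theta - next_b)) else 0
        a.append(next_a)
        b.append(next_b)
        y.append(resp)
        if 0 < sum(y) < len(y):
            estimate = mle_2pl(a, b, y)   # estimation step
            next_b = estimate             # design step: b_{k+1} = estimate
        else:
            next_b = next_b + step if resp == 1 else next_b - step
    return estimate


estimate = simulate_2pl_cat(theta=1.0, n_items=300, seed=2)
print(estimate)
```

Because the score is monotone, any bracketing root finder would do;
bisection is chosen here only for transparency.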

The preceding sequential design is not optimal, not even asymptotically. It 
is based on a suboptimal design that maximizes the Fisher information over 
b with a being fixed. Such an approach is intuitively sensible, because the 
adaptive test is to match the difficulty level of test items with examinee's 
ability and parameter b represents the item difficulty. Obviously, it does 
not touch upon selection of the discrimination parameter a, which involves 
more complex issues [see Chang and Ying (1999) and Chang, Qian and Ying 
(2001)]. 

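Theorem 2. Under the preceding sequential design for the two-parameter
logistic model, $\hat\theta_n \to \theta$ a.s. as $n \to \infty$. In
addition, suppose the choice of $a_j$ satisfies
$\sum_{i=1}^{n} a_i^2 / v_n \to_P 1$, as $n \to \infty$, for some
nonrandom sequence $v_n$. Then,

(3.4)  $\sqrt{\sum_{i=1}^{n} a_i^2/4}\;(\hat\theta_n - \theta) \to_{\mathcal{L}} N(0, 1).$

The normalizing factor $\sqrt{\sum_{i=1}^{n} a_i^2/4}$ may be replaced
by $\sqrt{I^{(n)}(\hat\theta_n)}$ or $\sqrt{I^{(n)}(\theta)}$, where

(3.5)  $I^{(n)}(\theta) = \sum_{i=1}^{n} a_i^2\,\frac{e^{a_i(\theta - b_i)}}{[1 + e^{a_i(\theta - b_i)}]^2}$

is the observed Fisher information.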

Remark 1. As we stated earlier, the solution to the optimal design
problem of maximizing the Fisher information is singular, in that the
discrimination parameter will reach $\infty$. The remedial measure
taken here is to restrict this parameter to a compact interval. Next we
construct an example to show that if the $a_j$ are not bounded, it is
possible that the resulting estimator $\hat\theta_n$ may not even be
consistent.

Example 1. Suppose we follow the same sequential design as described 
at the beginning of this section, but with = instead of confining the Ofc 
to a compact interval. Suppose, in addition, that the initial value is
taken to be $\hat\theta_0 < \theta - 1 - \pi^2/6$. If
$Y_1 = \cdots = Y_j = 0$, then the subsequent $\hat\theta_k$,
$1 \le k \le j$, will be chosen in decreasing order, so that
$\hat\theta_1 = \hat\theta_0 - \varepsilon_0, \ldots,
\hat\theta_j = \hat\theta_{j-1} - \varepsilon_0$, where
$\varepsilon_0 > 0$ is a prespecified constant. Let $n_0$ be a large
integer, so that the following conditions are satisfied:



(3.6) E 



1 



1 + ^ 3' 

k—riQ+l 

(37) Jno + l)^ I 

no 

(3.8) 3(no + lf<Yl k^- 

k=l 

Define the event $A = \{Y_k = 0, k \le n_0$ and
$Y_k = 1, k \ge n_0 + 1\}$. We prove below that $P(A) > 0$ and
$\lim_{n \to \infty} \hat\theta_n < \theta - 1$ on $A$. Therefore, with
such a design, $\hat\theta_n$ cannot be a consistent estimator of
$\theta$. Intuitively, this can occur because the movement of
successive $\hat\theta_j$ is tied to the $a$-parameter. A large value
of $a$ corresponds to a small movement size. The constructed example
makes the $a$-parameters so large that the $\hat\theta_j$ can never
move back, even if all the steps after $n_0$ are in the right
direction. Figure 1 shows graphically two sequences of $\hat\theta_j$,
one of which converges to $\theta$ while the other does not.

Remark 2. The constraint that the discrimination parameters $a_j$ are
bounded away from 0 is also needed. To see this, suppose we set
$a_j = 1/j$. Then, by (3.1), the total Fisher information for a test of
length $n$ is bounded by

$\sum_{j=1}^{n} \frac{a_j^2}{4} = \sum_{j=1}^{n} \frac{1}{4j^2} < \frac{\pi^2}{24}.$

In view of this, it is straightforward that the resulting maximum
likelihood estimator $\hat\theta_n$ will not converge to $\theta$.

Remark 3. It was pointed out by the Associate Editor that an item
with a large value of the $a$-parameter could be very uninformative if
knowledge about $\theta$ is poor, and also that a natural way to
increase efficiency is to use items with small $a$-parameter values in
early stages and items with large $a$-parameter values in later stages.
Indeed, such an approach could lead to, among other things, substantial
efficiency improvement. Figure 2 gives an efficiency comparison in
terms of mean squared errors using ascending and descending
$a$-parameter values. For more details and other related issues in
practical settings, we refer to Chang and Ying (1996, 1999). It is
worth noting that, if items with $a = \infty$ were available, then one
could design a scheme that approaches the true $\theta$ exponentially
fast, though such a scheme is likely to be different from maximum
likelihood estimation.

Fig. 1. Examples of convergence and nonconvergence (horizontal axis:
number of items).

Remark 4. As pointed out by one of the reviewers, setting $a \le M$ is
reasonable, because no item writer has ever been able to write a
sequence of items with $a$-parameters tending to infinity. Also, any
reasonable item bank would only include items with $a$-parameters
bounded away from 0. Assuming that the item bank contains all pairs
$(a, b)$ in $[m, M] \times (-\infty, \infty)$, Chang and Ying (1999)
proposed the $a$-stratified method with the objective of limiting the
exposure of any given item by using that item at the most advantageous
point in testing. The $a$-stratified method attempts to use less
discriminating items early in the test, when estimation is least
precise, and to save highly discriminating items until later stages,
when finer gradations of estimation are required. One of the advantages
of the $a$-stratified method is that it attempts to equalize the item
exposure rates for all the items in the pool.

Fig. 2. Mean squared errors under ascending and descending a-parameter
designs.



Proof of Theorem 2. The main line of the proof consists of the
following four steps. First, we show that the observed Fisher
information goes to infinity as $n \to \infty$. The second step is to
show that the design leads to bounded maximum likelihood estimators
$\hat\theta_n$. From the boundedness follows the consistency. The last
step is to show the asymptotic normality.

Throughout the proof, we shall let $G(t) = e^t/(1 + e^t)$ and
$\bar G(t) = 1 - G(t)$. Define the $\sigma$-filtration
$\mathcal{F}_k = \sigma\{Y_j, a_{j+1}; j \le k\}$, $k \ge 0$. Then,
conditioning on $\mathcal{F}_{k-1}$, $Y_k$ is a Bernoulli random
variable with success probability $G(a_k(\theta - b_k))$. Thus,
$\{Y_k - G(a_k(\theta - b_k))\}$ is a martingale difference sequence
with respect to $\{\mathcal{F}_k\}$. Since $a_k \in \mathcal{F}_{k-1}$
is predictable, $a_k[Y_k - G(a_k(\theta - b_k))]$ are again martingale
differences with conditional variances

$\mathrm{Var}\{a_k[Y_k - G(a_k(\theta - b_k))] \mid \mathcal{F}_{k-1}\} = a_k^2\, G(a_k(\theta - b_k))\, \bar G(a_k(\theta - b_k)).$

Applying the martingale local convergence theorem of Chow (1965),
Corollary 5, we have that

(3.9)  $\sum_{k=1}^{\infty} \frac{a_k[Y_k - G(a_k(\theta - b_k))]}{\sum_{j=1}^{k} a_j^2\, G(a_j(\theta - b_j))\, \bar G(a_j(\theta - b_j))}$  converges a.s.



We first prove that

(3.10)  $P\left( \sum_{k=1}^{\infty} a_k^2\, G(a_k(\theta - b_k))\, \bar G(a_k(\theta - b_k)) < \infty \right) = 0.$



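Let $A_1$ be the event that
$\sum_{k=1}^{\infty} a_k^2\, G(a_k(\theta - b_k))[1 - G(a_k(\theta - b_k))] < \infty$.
Clearly, on $A_1$, $\lim_{n \to \infty} |b_n| = \infty$ or,
equivalently, $\lim_{n \to \infty} |\hat\theta_n| = \infty$, recalling
that $b_{n+1} = \hat\theta_n$ as the design requires it. From (3.9) and
the monotonicity of the denominator sequence in (3.9), we have that
$\sum_{k=1}^{\infty} a_k[Y_k - G(a_k(\theta - b_k))]$ converges on
$A_1$. But $\sum_{k=1}^{n} a_k[Y_k - G(a_k(\hat\theta_n - b_k))] = 0$
for all $n$. So, on $A_1$,

(3.11)  $\infty > \left| \sum_{k=1}^{\infty} a_k[Y_k - G(a_k(\theta - b_k))] \right|$
        $= \lim_{n \to \infty} \left| \sum_{k=1}^{n} a_k[Y_k - G(a_k(\theta - b_k))] - \sum_{k=1}^{n} a_k[Y_k - G(a_k(\hat\theta_n - b_k))] \right|$
        $= \lim_{n \to \infty} \left| \sum_{k=1}^{n} a_k[G(a_k(\hat\theta_n - b_k)) - G(a_k(\theta - b_k))] \right|.$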

From (3.11) we claim that $\limsup \hat\theta_n < \infty$ on $A_1$. We
prove this claim by contradiction. Suppose it is not true. Then there
exists a subsequence $n_j$ such that $\hat\theta_{n_j} \to \infty$ and
$\hat\theta_{n_j} > \hat\theta_k$ for all $k < n_j$. This implies the
following:

(3.12)  $a_k[G(a_k(\hat\theta_{n_j} - b_k)) - G(a_k(\theta - b_k))] > 0$  for all $k \le n_j$,

(3.13)  for any constant $K$,  $\#\{k \le n_j : a_k(\theta - b_k) < -K\} \to \infty.$

Combining (3.12) with (3.13), we know that (3.11) cannot be true. Thus,
$\limsup \hat\theta_n < \infty$ on $A_1$. Likewise, we can show that
$\liminf \hat\theta_n > -\infty$ on $A_1$. These two contradict a
previous conclusion that
$\limsup_{n \to \infty} |\hat\theta_n| = \infty$ on $A_1$, unless $A_1$
is a null event. Hence (3.10) holds.

From (3.9), (3.10) and the Kronecker lemma [Chow and Teicher (1988),
page 114], it follows that

(3.14)  $\frac{\sum_{k=1}^{n} a_k[Y_k - G(a_k(\theta - b_k))]}{\sum_{k=1}^{n} a_k^2\, G(a_k(\theta - b_k))\, \bar G(a_k(\theta - b_k))} \to 0$  a.s.

Substituting the likelihood equation into (3.14), we get

(3.15)  $\frac{\sum_{k=1}^{n} a_k[G(a_k(\hat\theta_n - b_k)) - G(a_k(\theta - b_k))]}{\sum_{k=1}^{n} a_k^2\, G(a_k(\theta - b_k))\, \bar G(a_k(\theta - b_k))} \to 0$  a.s.,

which certainly implies that

(3.16)  $\frac{1}{n} \sum_{k=1}^{n} a_k[G(a_k(\hat\theta_n - b_k)) - G(a_k(\theta - b_k))] \to 0$  a.s.

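Next, we show that $\limsup |\hat\theta_n| < \infty$ a.s. Suppose that
this is not true and that, without loss of generality, there exists a
subsequence $\{n_j\}$ such that $\hat\theta_{n_j} \uparrow \infty$ and
$\hat\theta_{n_j} > \hat\theta_{k-1} = b_k$ for all $k \le n_j$. Let
$\delta_0 > 0$ be a fixed constant. Since $m \le a_k \le M$, we have

(3.17)  $a_k(\theta - b_k) \le -\delta_0$  if  $b_k \ge \theta + \frac{\delta_0}{m}.$

But $G(a_k(\hat\theta_{n_j} - b_k)) \ge G(0) = \frac{1}{2}$ for all
$k \le n_j$, which, together with (3.17), implies that

(3.18)  $G(a_k(\hat\theta_{n_j} - b_k)) - G(a_k(\theta - b_k)) \ge \frac{1}{2} - G(-\delta_0) > 0$  if  $b_k \ge \theta + \frac{\delta_0}{m}.$

Thus,
$\limsup_{j \to \infty} \#\{k \le n_j : b_k \ge \theta + \delta_0/m\}/n_j = 0$,
since we can otherwise select a subsequence of $n_j$ such that (3.16)
does not hold. On the other hand,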

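(3.19)  $G(a_k(\hat\theta_n - b_k)) - G(a_k(\theta - b_k))$
        $= \frac{e^{a_k(\hat\theta_n - b_k)} - e^{a_k(\theta - b_k)}}{[1 + e^{a_k(\hat\theta_n - b_k)}][1 + e^{a_k(\theta - b_k)}]}$
        $= \frac{e^{a_k(\theta - b_k)}(e^{a_k(\hat\theta_n - \theta)} - 1)}{[1 + e^{a_k(\hat\theta_n - b_k)}][1 + e^{a_k(\theta - b_k)}]}$
        $= \frac{e^{a_k(\theta - b_k)}(1 - e^{-a_k(\hat\theta_n - \theta)})}{[e^{-a_k(\hat\theta_n - \theta)} + e^{a_k(\theta - b_k)}][1 + e^{a_k(\theta - b_k)}]}.$

Since $\hat\theta_{n_j} \to \infty$, we have, in view of (3.19),

$G(a_k(\hat\theta_{n_j} - b_k)) - G(a_k(\theta - b_k)) = (1 + o(1))\,\frac{1}{1 + e^{a_k(\theta - b_k)}},$

which has the same order as

$G(a_k(\theta - b_k))\, \bar G(a_k(\theta - b_k)) = \frac{e^{a_k(\theta - b_k)}}{[1 + e^{a_k(\theta - b_k)}]^2}$

for all $b_k < \theta + \delta_0/m$. But, we know that
$\#\{k \le n_j : b_k < \theta + \delta_0/m\}/n_j \to 1$. So, (3.15)
cannot hold along $n = n_j$. This contradiction proves that
$\limsup |\hat\theta_n| < \infty$ a.s.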

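Now, by the mean value theorem, there exists $\theta_n^*$ between
$\theta$ and $\hat\theta_n$ such that

$\sum_{k=1}^{n} a_k[G(a_k(\hat\theta_n - b_k)) - G(a_k(\theta - b_k))] = \sum_{k=1}^{n} a_k^2\, G(a_k(\theta_n^* - b_k))\, \bar G(a_k(\theta_n^* - b_k))\,(\hat\theta_n - \theta).$

Furthermore,
$\liminf \sum_{k=1}^{n} a_k^2\, G(a_k(\theta_n^* - b_k))\, \bar G(a_k(\theta_n^* - b_k)) > 0$
since $\limsup |\hat\theta_n| < \infty$. Hence, (3.15) implies that
$\hat\theta_n \to \theta$ a.s.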

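To prove the asymptotic normality, we follow the standard approach by
taking the Taylor expansion; that is,

(3.20)  $0 = \sum_{k=1}^{n} a_k[Y_k - G(a_k(\hat\theta_n - b_k))]$
        $= \sum_{k=1}^{n} a_k[Y_k - G(a_k(\theta - b_k))] - \sum_{k=1}^{n} a_k^2\, G(a_k(\theta_n^* - b_k))\, \bar G(a_k(\theta_n^* - b_k))\,(\hat\theta_n - \theta),$

where $\theta_n^*$ is between $\hat\theta_n$ and $\theta$ and therefore
converges to $\theta$ a.s. From (3.20), we have

$\hat\theta_n - \theta = \left[ \sum_{k=1}^{n} a_k^2\, G(a_k(\theta_n^* - b_k))\, \bar G(a_k(\theta_n^* - b_k)) \right]^{-1} \sum_{k=1}^{n} a_k[Y_k - G(a_k(\theta - b_k))].$

Since $\hat\theta_n \to \theta$ a.s. and $b_n \to \theta$ a.s., it
follows that (3.4) holds if we can show that

(3.21)  $\left( \sum_{k=1}^{n} \frac{a_k^2}{4} \right)^{-1/2} \sum_{k=1}^{n} a_k[Y_k - G(a_k(\theta - b_k))] \to_{\mathcal{L}} N(0, 1).$

By the assumption, there is a nonrandom sequence $v_n \to \infty$ such
that $\sum_{k=1}^{n} a_k^2 / v_n \to_P 1$. Thus, we can apply the
martingale central limit theorem, as stated in Pollard [(1984), page
171], to get (3.21). Because $\hat\theta_n \to \theta$ a.s., we can
easily see that $\sum_{k=1}^{n} a_k^2/4$ is asymptotically equivalent
to $I^{(n)}(\hat\theta_n)$ as well as $I^{(n)}(\theta)$.
□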
4. The three-parameter logistic model and a modification to the
maximum likelihood recursions. The three-parameter logistic model, as
specified by (1.2), extends the two-parameter model by including an
additional parameter known as the guessing parameter. Recall that the
ICC, in this case, is
$c + (1 - c)\, e^{a(\theta - b)}/[1 + e^{a(\theta - b)}]$. It is not
difficult to see that, when $c > 0$, the family of probability
distributions indexed by $\theta$ no longer forms an exponential
family. Therefore, we expect that there will be extra technical
difficulties to deal with.

For an item with parameters $a$, $b$ and $c$, the associated Fisher
information function may be calculated using (1.4) to be

(4.1)  $I(\theta \mid a, b, c) = \frac{(1 - c)\, a^2\, e^{2a(\theta - b)}}{[c + e^{a(\theta - b)}][1 + e^{a(\theta - b)}]^2}.$

For fixed $a$ and $c$, the Fisher information reaches its maximum when

(4.2)  $b = \theta - \frac{1}{a} \log \frac{1 + \sqrt{1 + 8c}}{2}$

[see Lord (1980), page 152]. As the two examples in Section 3
indicated, the discrimination parameter cannot be chosen arbitrarily
because otherwise it may lead to inconsistency. It is also reasonable
to put restrictions on selecting $c$, the guessing parameter. This is
because, in view of (4.1), the Fisher information reaches the maximum
if and only if $c = 0$. So if no constraint is put, then only those
items with no guessing will be used. The design problem we shall be
considering will only involve the choice of $b$, with $a$ and $c$ being
confined to certain reasonable regions.
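
As a numerical sanity check, the difficulty
$b = \theta - a^{-1}\log\{(1 + \sqrt{1 + 8c})/2\}$ can be compared with
a brute-force maximization of the 3-PL information
$(1 - c) a^2 e^{2a(\theta - b)} / \{[c + e^{a(\theta - b)}][1 +
e^{a(\theta - b)}]^2\}$ over a grid of $b$ values. The sketch below is
illustrative only; the function names, the grid and the parameter values
$\theta = 0$, $a = 1.5$, $c = 0.2$ are assumptions made for this
example.

```python
import math


def info_3pl(theta, a, b, c):
    """Fisher information of a 3-PL item with parameters (a, b, c)."""
    e = math.exp(a * (theta - b))
    return (1.0 - c) * a * a * e * e / ((c + e) * (1.0 + e) ** 2)


def optimal_b(theta, a, c):
    """Information-maximizing difficulty for fixed a and c."""
    return theta - math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / a


theta, a, c = 0.0, 1.5, 0.2          # hypothetical values
b_star = optimal_b(theta, a, c)
# Brute-force search over b in [-3, 3] with spacing 0.001.
grid = [-3.0 + 0.001 * k for k in range(6001)]
b_grid = max(grid, key=lambda b: info_3pl(theta, a, b, c))
print(b_star, b_grid)
```

When $c = 0$ the logarithm vanishes and the rule reduces to
$b = \theta$, as in the two-parameter case; for $c > 0$ the most
informative item is slightly easier than the examinee's ability.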

In view of (4.2), we can select the optimal $b$ if $\theta$ is
specified. The adaptive optimal design is then to replace $\theta$ by
its current estimator. As we shall see, it turns out that the maximum
likelihood estimating equation may have multiple roots. To avoid such a
situation, we shall propose a modification, which is asymptotically
equivalent to the likelihood estimating equation and has a unique root.

Suppose that the examinee has answered $n$ items, which are specified
by $(a_k, b_k, c_k)$, $k = 1, \ldots, n$, and the results are
$Y_1, \ldots, Y_n$. Then, the maximum likelihood estimating equation
for $\theta$ may be written as

(4.3)  $\sum_{k=1}^{n} \frac{a_k\, e^{a_k(\theta - b_k)}}{c_k + e^{a_k(\theta - b_k)}} \left[ Y_k - c_k - (1 - c_k)\,\frac{e^{a_k(\theta - b_k)}}{1 + e^{a_k(\theta - b_k)}} \right] = 0.$



Unlike in the two-parameter logistic model, the left-hand side of (4.3)
is not a monotone function of $\theta$. In fact, (4.3) may have
multiple roots [Samejima (1973)]. On the other hand, when the choice of
the difficulty parameter satisfies (4.2) ($\theta$ will be replaced by
the current estimator), it is easy to see that the weights in (4.3)
satisfy

$\frac{a_k\, e^{a_k(\theta - b_k)}}{c_k + e^{a_k(\theta - b_k)}} \approx \frac{a_k (1 + \sqrt{1 + 8c_k})}{2c_k + 1 + \sqrt{1 + 8c_k}}.$

Therefore, an approximation to (4.3) is

(4.4)  $\sum_{k=1}^{n} \frac{a_k (1 + \sqrt{1 + 8c_k})}{2c_k + 1 + \sqrt{1 + 8c_k}} \left[ Y_k - c_k - (1 - c_k)\,\frac{e^{a_k(\theta - b_k)}}{1 + e^{a_k(\theta - b_k)}} \right] = 0,$

which will be called the approximate maximum likelihood estimating
equation. It is obvious that the left-hand side of (4.4) is monotone
decreasing in $\theta$. Therefore, the solution to (4.4), if it exists,
will be unique. Notice also that the weights in (4.4) do not depend on
the $b_k$.
An extension of the adaptive design procedure proposed in the preceding
section to the three-parameter model is described below:

1. Initialization. In the same way as that for the two-parameter
logistic model, choose the initial $k_0$ items so that
$\{Y_i, i \le k_0\}$ contains both 0 and 1.

2. Selection of $\hat\theta_k$. For each $k \ge k_0$, if

(4.5)  $\sum_{i=1}^{k} w_i Y_i > \sum_{i=1}^{k} w_i c_i,$

where $w_i = a_i(1 + \sqrt{1 + 8c_i})/(2c_i + 1 + \sqrt{1 + 8c_i})$ are
the weights in (4.4), then define $\hat\theta_k$ as the unique solution
to (4.4). Otherwise, set $\hat\theta_k = r_k$, where
$r_k \downarrow -\infty$ is a predetermined sequence.
3. Design. After selecting $\hat\theta_k$, set
$b_{k+1} = \hat\theta_k$. Also, set $a_{k+1}$ and $c_{k+1}$ such that
$a_{k+1} \in [m, M]$ and $c_{k+1} \le 1 - \delta_0$, where
$\delta_0 > 0$ is some constant.
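
The selection step for the three-parameter model can be sketched as
follows. This is a hedged illustration rather than the authors' code:
it uses the weights $w_k = a_k(1 + \sqrt{1 + 8c_k})/(2c_k + 1 +
\sqrt{1 + 8c_k})$, checks the existence condition $\sum w_k Y_k >
\sum w_k c_k$, and, when that holds, finds the root of the weighted
equation $\sum_k w_k[Y_k - c_k - (1 - c_k)G(a_k(\theta - b_k))] = 0$ by
bisection. The function names, the bisection bracket and the `fallback`
argument (standing in for the predetermined value $r_k$) are
assumptions, and the responses are assumed to contain at least one 0.

```python
import math


def logistic(z):
    """Numerically stable G(z) = e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def weight(a, c):
    """Weight a (1 + sqrt(1 + 8c)) / (2c + 1 + sqrt(1 + 8c))."""
    s = math.sqrt(1.0 + 8.0 * c)
    return a * (1.0 + s) / (2.0 * c + 1.0 + s)


def approx_mle_3pl(a, b, c, y, fallback, lo=-20.0, hi=20.0, tol=1e-8):
    """Root of the approximate estimating equation, or `fallback`.

    `fallback` plays the role of the predetermined value r_k and is
    returned when the existence condition fails.
    """
    w = [weight(ai, ci) for ai, ci in zip(a, c)]
    if not sum(wi * yi for wi, yi in zip(w, y)) > sum(
            wi * ci for wi, ci in zip(w, c)):
        return fallback

    def lhs(theta):
        return sum(wi * (yi - ci - (1.0 - ci) * logistic(ai * (theta - bi)))
                   for wi, ai, bi, ci, yi in zip(w, a, b, c, y))

    # lhs is decreasing in theta, so bisection finds the unique root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two items without guessing, one right and one wrong: the root is 0.
print(approx_mle_3pl([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1, 0], fallback=-5.0))
# Heavy guessing: the existence condition fails and the fallback is used.
print(approx_mle_3pl([1.0, 1.0], [0.0, 0.0], [0.5, 0.5], [1, 0], fallback=-5.0))
```

Because the weights are constants in $\theta$, the left-hand side is a
decreasing function of $\theta$, which is what makes a simple bracketing
search adequate here.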

Remark 5. If $c_j = 0$, then (4.5) is always satisfied, since there is
at least one $i$ such that $Y_i = 1$. In fact, it is easily seen that
(4.5) is a necessary and sufficient condition for the modified maximum
likelihood estimating equation (4.4) to have a solution.

Remark 6. The use of upper and lower bounds $M$ and $m$ for the $a_k$
is explained in the preceding section. The requirement that the $c_k$
be bounded above by $1 - \delta_0$ is natural, as the guessing
parameter would never exceed 0.5. However, as indicated by one
reviewer, there should be other constraints in real applications (e.g.,
we cannot allow the algorithm to select only items with $a = M$ and
$c = 0$).



Theorem 2 can now be extended to cover the sequential design as just
described for the three-parameter logistic model.

Theorem 3. For the sequential design defined in this section, the
modified maximum likelihood estimating equation (4.4) has, with
probability 1, a unique solution for all large $n$. The solution is
strongly consistent (i.e., $\hat\theta_n \to \theta$ a.s.).
Furthermore, provided that

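for some nonrandom sequence $v_n$,

(4.6)  $\sqrt{v_n}\,(\hat\theta_n - \theta) \to_{\mathcal{L}} N(0, 1).$

The normalizing constant $v_n$ in (4.6) may be replaced by the
estimated Fisher information

(4.7)  $I(\hat\theta_n) = \sum_{i=1}^{n} \frac{(1 - c_i)\, a_i^2\, e^{2a_i(\hat\theta_n - b_i)}}{[c_i + e^{a_i(\hat\theta_n - b_i)}][1 + e^{a_i(\hat\theta_n - b_i)}]^2}.$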
Proof. As in the proof of Theorem 2, define $G(t) = e^t/(1 + e^t)$,
$\bar G(t) = 1 - G(t)$ and
$\mathcal{F}_k = \sigma\{Y_j, a_{j+1}, c_{j+1}; j \le k\}$. Applying
the martingale local convergence theorem, we have that, analogous to
(3.9),

(4.8)  $\sum_{k=1}^{\infty} \frac{w_k[Y_k - c_k - (1 - c_k) G(a_k(\theta - b_k))]}{\sum_{j=1}^{k} w_j^2 [c_j + (1 - c_j) G(a_j(\theta - b_j))](1 - c_j) \bar G(a_j(\theta - b_j))}$  converges a.s.,

where $w_k = a_k(1 + \sqrt{1 + 8c_k})/(2c_k + 1 + \sqrt{1 + 8c_k})$. A
slight modification of the proof leading to (3.10) can be constructed
to show that

(4.9)  $P\left( \sum_{k=1}^{\infty} w_k^2 [c_k + (1 - c_k) G(a_k(\theta - b_k))](1 - c_k) \bar G(a_k(\theta - b_k)) < \infty \right) = 0.$
To provide a sketch of the proof of (4.9), let $A_1$ denote the event
inside the probability sign in (4.9). Then, on $A_1$,
$\lim_{n \to \infty} |b_n| = \infty$. We next prove, by contradiction,
that $\limsup \hat\theta_n = \infty$ is impossible. Suppose that
$\limsup \hat\theta_n = \infty$. Then, there is a subsequence $n_j$
such that $\hat\theta_k < \hat\theta_{n_j}$, $k < n_j$. By the
definition of $\hat\theta_n$, for $n \ge k_0$,

(4.10)  $\sum_{k=1}^{n} w_k[Y_k - c_k - (1 - c_k) G(a_k(\hat\theta_n - b_k))] \le 0,$

with the equality holding if and only if
$\sum_{k=1}^{n} w_k Y_k > \sum_{k=1}^{n} w_k c_k$. From (4.10), we have

(4.11)  $\sum_{k=1}^{n} w_k (1 - c_k)[G(a_k(\hat\theta_n - b_k)) - G(a_k(\theta - b_k))] \ge \sum_{k=1}^{n} w_k[Y_k - c_k - (1 - c_k) G(a_k(\theta - b_k))],$

the right-hand side of which converges to a finite limit on $A_1$.
However, we can easily see that (3.12) and (3.13) still hold here. But
they imply that the left-hand side of (4.11) can be arbitrarily small,
which is a contradiction. Thus, $\limsup \hat\theta_n < \infty$ on
$A_1$. Similarly, $\liminf \hat\theta_n > -\infty$ on $A_1$. Hence,
$A_1$ must be a null set, and (4.9) holds.

Now, by the Kronecker lemma, we get from (4.8) and (4.9) that 

(4.12)  $\dfrac{\sum_{k=1}^{n} w_k[Y_k - c_k - (1-c_k)G(a_k(\theta - b_k))]}{\sum_{k=1}^{n} w_k^2[c_k + (1-c_k)G(a_k(\theta - b_k))](1-c_k)\bar G(a_k(\theta - b_k))} \to 0$  a.s. 

Furthermore, by the definition of $\hat\theta_n$, for n large enough such that 
$r_n < \theta$, 

$\sum_{k=1}^{n} w_k Y_k \le \sum_{k=1}^{n} w_k[c_k + (1-c_k)G(a_k(\hat\theta_n - b_k))],$ 

which is $< \sum_{k=1}^{n} w_k[c_k + (1-c_k)G(a_k(\theta - b_k))]$ if $\hat\theta_n = r_n$. Therefore, 
(4.12) implies that 

(4.13)  $\dfrac{\sum_{k=1}^{n} w_k(1-c_k)[G(a_k(\hat\theta_n - b_k)) - G(a_k(\theta - b_k))]}{\sum_{k=1}^{n} w_k^2[c_k + (1-c_k)G(a_k(\theta - b_k))](1-c_k)\bar G(a_k(\theta - b_k))} \to 0$  a.s., 

which is analogous to (3.15). 

By examining the derivation following (3.15), we see that the same argument 
can be used to show that $\limsup |\hat\theta_n| < \infty$ a.s. In particular, this implies 
that, for all large n, $\hat\theta_n$ is the solution to (4.4). It also implies, together 
with (4.13), that $\hat\theta_n \to \theta$ a.s. 

Finally, we can apply the Taylor series expansion to (4.4) to obtain asymptotic 
normality. The argument is exactly the same as that in the proof of 
Theorem 2. □ 
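To make the estimating function used in the proof concrete, the following is a minimal Python sketch. It assumes that the modified estimating function has the form appearing in (4.10), namely $\sum_k w_k[Y_k - c_k - (1-c_k)G(a_k(\theta-b_k))]$ with the weights $w_k$ defined after (4.8). The function names, the bisection solver and the truncation bounds `lo` and `hi` are illustrative assumptions and are not taken from the paper.

```python
import math


def G(t):
    """Logistic function e^t / (1 + e^t), evaluated without overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def weight(a, c):
    """Weight w = a(1 + sqrt(1 + 8c)) / (2c + 1 + sqrt(1 + 8c))."""
    s = math.sqrt(1.0 + 8.0 * c)
    return a * (1.0 + s) / (2.0 * c + 1.0 + s)


def estimating_function(theta, items, responses):
    """Sum of w_k [Y_k - c_k - (1 - c_k) G(a_k (theta - b_k))].

    items is a list of (a, b, c) triples; responses is a list of 0/1 values.
    Each summand is decreasing in theta, so the sum is monotone.
    """
    total = 0.0
    for (a, b, c), y in zip(items, responses):
        total += weight(a, c) * (y - c - (1.0 - c) * G(a * (theta - b)))
    return total


def solve(items, responses, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection root on [lo, hi]; returns a bound if there is no sign change.

    The bounds lo and hi are hypothetical truncation values chosen for
    illustration only.
    """
    if estimating_function(lo, items, responses) <= 0.0:
        return lo
    if estimating_function(hi, items, responses) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if estimating_function(mid, items, responses) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because every summand is strictly decreasing in $\theta$, the root is unique whenever it exists, which is the property that the unmodified three-parameter likelihood equation may lack.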



5. Discussion. CAT has become a popular mode of educational assess- 
ment in the United States. Examples of large-scale applications include the 
Graduate Record Examination (GRE), the Graduate Management Admis- 
sion Test (GMAT), the National Council of State Boards of Nursing and 
the Armed Services Vocational Aptitude Battery (ASVAB). The most im- 
portant component in a CAT is the item selection procedure that is used 
to select items during the course of the test. To date the most commonly 
used item selection procedure is the maximum Fisher information method. 
The motivation for maximizing the Fisher information is to make the trait 
estimator the most efficient. This can be achieved by recursively estimating 
$\theta$ with currently available data and assigning further items adaptively. 
However, it is necessary to establish the corresponding theoretical properties 
for the maximum information approach. 
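The recursion described above can be illustrated with a small simulation. The sketch below assumes the Rasch model with an idealized infinite item pool, so that the next difficulty can be set exactly equal to the current estimate, which is where the Rasch item information $G(1-G)$ is maximized. The function names, the truncation interval $[-6, 6]$ used while no finite maximum likelihood estimate exists, and the starting value are assumptions made only for illustration.

```python
import math
import random


def G(t):
    """Logistic function e^t / (1 + e^t), evaluated without overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def rasch_mle(difficulties, responses, lo=-6.0, hi=6.0, tol=1e-8):
    """Root of sum_k [Y_k - G(theta - b_k)] by bisection, clamped to [lo, hi]."""
    def score(theta):
        return sum(y - G(theta - b) for b, y in zip(difficulties, responses))

    if score(lo) <= 0.0:  # e.g., all responses incorrect: no finite MLE
        return lo
    if score(hi) >= 0.0:  # e.g., all responses correct: no finite MLE
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_cat(theta_true, n_items, theta_init=0.0, seed=0):
    """Administer n_items Rasch items, each with difficulty b = current estimate."""
    rng = random.Random(seed)
    theta_hat = theta_init
    difficulties, responses = [], []
    for _ in range(n_items):
        b = theta_hat  # maximum-information choice under the Rasch model
        y = 1 if rng.random() < G(theta_true - b) else 0
        difficulties.append(b)
        responses.append(y)
        theta_hat = rasch_mle(difficulties, responses)
    return theta_hat


if __name__ == "__main__":
    print(simulate_cat(theta_true=1.0, n_items=200))
```

Repeating the simulation over many seeds and test lengths is one simple way to compare the empirical spread of the estimator with the asymptotic theory.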

The main objective of this paper is to tackle the sequential design and re- 
lated convergence problems arising from the inherent mechanism of adaptive 
testing. It is clear that the logistic item response theory models are natural 
choices for CAT. We showed that, for the Rasch model, the usual plug-in 
adaptive design anchored in the current maximum likelihood estimator of 
the ability parameter converges to the optimal limit, and is therefore asymp- 
totically efficient; moreover, the rate of the convergence can be characterized 
by the asymptotic normality of the maximum likelihood estimator. For the 
two-parameter logistic model, a similar asymptotic theory was developed 
based on an additional parameter modeling assumption that the discrim- 
ination power is restricted to a compact interval. Examples were given to 
illustrate that such restriction is necessary. As to the three-parameter logis- 
tic model, since the maximum likelihood estimating function is not generally 
a monotone function of the ability parameter, the maximum likelihood es- 
timator may not be unique and, therefore, establishing convergence for the 
three-parameter logistic model is more complicated. Recognizing this po- 
tential problem, we proposed an asymptotically equivalent estimating func- 
tion that is monotone in the ability parameter. Consistency and asymptotic 
normality were then proved for the adaptive design based on the modified 
maximum likelihood estimator. 

The large scale implementation of CAT has created many interesting sta- 
tistical issues in design, modeling and analysis. Our theory is established 
for the idealized setup that assumes existence of an infinite item pool. Even 
though, in reality, only finitely many items are available, the theory can 
still serve as useful guidance to CAT practitioners as to how to choose an 
item selection strategy and how to design a simulation validation as well. 
In practice, simulation studies are always needed to help practitioners to 
evaluate the performance of their adaptive designs. According to the diver- 
gence examples created for the two-parameter logistic model, items with low 
discrimination should be used at the beginning of the test while items with 
high discrimination should be used at later stages. Therefore, a significant 
aspect of the new developments presented in this paper is to provide theoret- 
ical support to the item selection strategy of the a-stratified item selection 
method [Chang and Ying (1999)]. 

In order to design a good CAT algorithm, many complex controls are 
needed such as item exposure control and content balance. The item exposure 
rate for each item is defined as the ratio of the number of times the item 
is administered to the total number of examinees. Since CAT is designed to 
select the best items for each examinee, certain types of items tend to be 
most often selected by the computers, and many items are not selected at all, 
thereby making item exposure rates quite uneven. In addition, various non- 
statistical constraints need to be considered during item selection. Today's 
large-scale application of computer-based achievement tests and licensure 
exams has generated great challenges to test development. Maintaining con- 
tent representation and other constraints is central to test defensibility and 
validity. Examples of the nonstatistical constraints include: a certain pro- 
portion of items should be selected from each content area, correct answers 
should fall approximately equally on options A, B, C and D, and a limited 
number of special items are allowed on a test, such as items with negative 
stems (e.g., "Which of the following choices is NOT true?"), just to name a 
few. 
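The item exposure rate defined above is straightforward to compute from administration records. The following Python sketch assumes that each examinee's test is stored as a list of item identifiers; the function name and the data layout are hypothetical.

```python
from collections import Counter


def exposure_rates(administrations, pool):
    """Exposure rate of each pool item: times administered / number of examinees.

    administrations is a list with one list of item ids per examinee;
    pool is the collection of all item ids, including items never selected.
    """
    n_examinees = len(administrations)
    if n_examinees == 0:
        raise ValueError("at least one examinee is required")
    counts = Counter()
    for items in administrations:
        counts.update(set(items))  # an item counts at most once per examinee
    return {item: counts[item] / n_examinees for item in pool}
```

Items that are never selected receive a rate of zero, which makes the unevenness of exposure across the pool directly visible.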

The a-stratified method was proposed with the objective of limiting the 
exposure of any given item by using that item at the most advantageous 
point in testing. The a-stratified method attempts to control item expo- 
sure by using less discriminating items early in the test, when estimation is 
least precise, and saving highly discriminating items until later stages, when 
finer gradations of estimation are required. One of the advantages of the 
a-stratified method is that it attempts to equalize the item exposure rates 
for all the items in the pool. Recently, methods of controlling content bal- 
ance for the a-stratified method were proposed [see, e.g., van der Linden and 
Chang (2003), Yi and Chang (2003) and Cheng, Chang and Yi (2007)]. The 
advantages of using these methods are twofold: First, they allow the 
implementation of constraints on item selection in a-stratified adaptive testing; 
second, the constrained a-stratified methods may result in a set of theoretical 
advantages. It is evident that, by enforcing certain reasonable regularity 
conditions, the consistency results presented in this paper can be generalized 
to the constrained a-stratified methods, along with other reasonable item 
selection methods, such as the Bayesian item-selection criteria [see, e.g., 
van der Linden (1998)] and several Kullback-Leibler information based 
methods [see, e.g., Chang and Ying (1996)]. 
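The basic (unconstrained) a-stratified idea discussed in this paragraph can be sketched as follows. The sketch assumes that the pool is partitioned into strata of roughly equal size by ascending discrimination, that stage s of the test draws from stratum s, and that within a stratum the unused item whose difficulty is closest to the current ability estimate is selected. These details, together with all names in the code, are simplifying assumptions for illustration and do not cover exposure or content constraints.

```python
import math


def a_stratified_select(pool, theta_hat, stage, n_strata, used):
    """Pick the next item under a simple a-stratified rule.

    pool is a list of (item_id, a, b) triples, stage is the 0-based stratum
    index for the current part of the test, and used is a set of item ids
    already administered to this examinee.
    """
    ranked = sorted(pool, key=lambda item: item[1])  # ascending discrimination
    size = math.ceil(len(ranked) / n_strata)
    stratum = ranked[stage * size:(stage + 1) * size]
    candidates = [item for item in stratum if item[0] not in used]
    if not candidates:
        raise ValueError("no unused items left in this stratum")
    # Within the stratum, match difficulty b to the current ability estimate.
    return min(candidates, key=lambda item: abs(item[2] - theta_hat))
```

Early stages therefore use the low-a strata and later stages the high-a strata, in line with the strategy suggested by the divergence examples.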

Similar procedures can be developed, and their properties can be estab- 
lished for other parametric item response theory models. A particularly use- 
ful class is the normal ogive models, in which the logistic link function is 
replaced by the normal distribution function. A minor technical complica- 
tion in dealing with the normal ogive is that, even in the one-parameter case, 
the maximum likelihood estimating function is not monotone and may have 
multiple roots. But this complication may be dealt with by slightly mod- 
ifying the estimating function, as we did for the three-parameter logistic 
model. 

The example presented in Remark 1 following Theorem 2 appears 
to be somewhat paradoxical, in that the design is intended to increase 
efficiency by making the discrimination parameter large. But a closer look at 
the design reveals that the inconsistency of $\hat\theta_n$ should be expected. This is 
because the amount of information at $\theta$, the true ability parameter, for the 
kth item may be extremely small when $b_k$ is not close enough to $\theta$ and $a_k$ is 
large. More specifically, when the magnitude of $a_k(\theta - b_k)$ is large, the Fisher 
information for the item is exponentially small, with the exponent 
proportional to $-|a_k(\theta - b_k)|$. Since under normal circumstances, $b_k = \hat\theta_{k-1}$ is 
about $O(k^{-1/2})$ away from $\theta$ [Chang and Stout (1993)], the choice $a_k = k^3$ 
effectively makes $|a_k(\theta - b_k)|$ very large. However, we still do not know 
whether choosing $a_k = o(\sqrt{k})$ is sufficient to guarantee the consistency of $\hat\theta_n$. 
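A short numerical illustration of this decay may be helpful. The sketch below uses the standard two-parameter logistic item information $a^2 G(x)(1-G(x))$ with $x = a(\theta - b)$, and compares $a_k = k^3$ with a fixed $a = 1$ when the gap $\theta - b_k$ is taken to be exactly $k^{-1/2}$. That gap is a stylized assumption standing in for the $O(k^{-1/2})$ rate, and the function name is hypothetical.

```python
import math


def item_information_2pl(a, theta, b):
    """Two-parameter logistic item information a^2 G(x)(1 - G(x)), x = a(theta - b)."""
    x = abs(a * (theta - b))  # the information is symmetric in x
    e = math.exp(-x)          # G(x)(1 - G(x)) = e^{-x} / (1 + e^{-x})^2
    return a * a * e / (1.0 + e) ** 2


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 10):
        gap = k ** -0.5  # stylized distance between theta and b_k
        steep = item_information_2pl(k ** 3, gap, 0.0)
        flat = item_information_2pl(1.0, gap, 0.0)
        print(k, steep, flat)
```

With $a_k = k^3$ the factor $e^{-|a_k(\theta-b_k)|}$ overwhelms the polynomial factor $a_k^2$, whereas with a bounded discrimination the information stays of order one.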

Finally, it should be pointed out that Mislevy and Wu (1996) and 
Mislevy and Chang (2000) showed that item selection in CAT leads to a 
design with missing data that are missing at random (MAR). Therefore, 
most of the standard theory for MLE holds from a missing data point of 
view. 



APPENDIX 



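Proof of inconsistency for Example 1. On event A, we know 
that $\hat\theta_k$, $k \le n_0$, are initialized so that $\hat\theta_1 = \hat\theta_0 - \varepsilon_0, \ldots, \hat\theta_{n_0} = \hat\theta_{n_0-1} - \varepsilon_0$ and 
$\hat\theta_k$, $k \ge n_0 + 1$, satisfy the maximum likelihood equations. We first claim that, 
on A, 

(A.1)  $\hat\theta_{n_0+1} < \hat\theta_0 = \max_{0 \le k \le n_0} \hat\theta_k.$ 

Recall that $b_k = \hat\theta_{k-1}$. So, (3.3) entails 

(A.2)  $\dfrac{(n_0+1)^3}{1+\exp[(n_0+1)^3(\hat\theta_{n_0+1} - \hat\theta_{n_0})]} = \sum_{k=1}^{n_0} \dfrac{k^3\exp[k^3(\hat\theta_{n_0+1} - \hat\theta_{k-1})]}{1+\exp[k^3(\hat\theta_{n_0+1} - \hat\theta_{k-1})]}.$ 

Suppose that (A.1) does not hold. Then, $\hat\theta_{n_0+1} \ge \hat\theta_k$, $k \le n_0$, implying 

left-hand side of (A.2) $\le \tfrac{1}{2}(n_0+1)^3$ 

and 

right-hand side of (A.2) $\ge \tfrac{1}{2}\sum_{k=1}^{n_0} k^3.$ 
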

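These two inequalities contradict (3.8). Thus, (A.1) holds. 

Applying (3.3) to $\hat\theta_{n_0+2}$, we get 

(A.3)  $\dfrac{(n_0+2)^3}{1+\exp[(n_0+2)^3(\hat\theta_{n_0+2} - \hat\theta_{n_0+1})]} + \dfrac{(n_0+1)^3}{1+\exp[(n_0+1)^3(\hat\theta_{n_0+2} - \hat\theta_{n_0})]} = \sum_{k=1}^{n_0} \dfrac{k^3\exp[k^3(\hat\theta_{n_0+2} - \hat\theta_{k-1})]}{1+\exp[k^3(\hat\theta_{n_0+2} - \hat\theta_{k-1})]}.$ 

Note that, on A, since $Y_k = 1$, $k \ge n_0 + 1$, $\hat\theta_{n_0+k}$ is increasing in k. From 
(A.3), we claim either $\hat\theta_{n_0+2} \le \hat\theta_{n_0} + \varepsilon_0$ or 

(A.4)  $\hat\theta_{n_0+2} - \hat\theta_{n_0+1} < \dfrac{1}{(n_0+2)^2}.$ 

To prove this claim, suppose $\hat\theta_{n_0+2} > \hat\theta_{n_0} + \varepsilon_0 = \hat\theta_{n_0-1}$. Then, 

(A.5)  $\dfrac{(n_0+1)^3}{1+\exp[(n_0+1)^3(\hat\theta_{n_0+2} - \hat\theta_{n_0})]} \le \dfrac{(n_0+1)^3}{1+\exp[(n_0+1)^3\varepsilon_0]} \le \dfrac{1}{6},$ 

where the last inequality comes from (3.7). But $\hat\theta_{n_0+2} > \hat\theta_{n_0-1}$, implying that 

(A.6)  $\sum_{k=1}^{n_0} \dfrac{k^3\exp[k^3(\hat\theta_{n_0+2} - \hat\theta_{k-1})]}{1+\exp[k^3(\hat\theta_{n_0+2} - \hat\theta_{k-1})]} > \dfrac{n_0^3\exp[n_0^3(\hat\theta_{n_0+2} - \hat\theta_{n_0-1})]}{1+\exp[n_0^3(\hat\theta_{n_0+2} - \hat\theta_{n_0-1})]} \ge \dfrac{1}{2}.$ 

Combining (A.5), (A.6) with (A.3), we have 

$\dfrac{(n_0+2)^3}{1+\exp[(n_0+2)^3(\hat\theta_{n_0+2} - \hat\theta_{n_0+1})]} \ge \dfrac{1}{3},$ 

which, in conjunction with (3.6), entails (A.4). 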

Likewise, for $\hat\theta_{n_0+3}$, we claim one of the following must be true: 

(A.7)  $\hat\theta_{n_0+3} \le \hat\theta_{n_0} + \varepsilon_0 = \hat\theta_{n_0-1},$ 

(A.8)  $\hat\theta_{n_0+3} - \hat\theta_{n_0+1} < \dfrac{1}{(n_0+2)^2},$ 

(A.9)  $\hat\theta_{n_0+3} - \hat\theta_{n_0+2} < \dfrac{1}{(n_0+3)^2}.$ 

To show this, suppose all of them are not true. Then the likelihood equation 

(A.10)  $\dfrac{(n_0+3)^3}{1+\exp[(n_0+3)^3(\hat\theta_{n_0+3} - \hat\theta_{n_0+2})]} + \dfrac{(n_0+2)^3}{1+\exp[(n_0+2)^3(\hat\theta_{n_0+3} - \hat\theta_{n_0+1})]} + \dfrac{(n_0+1)^3}{1+\exp[(n_0+1)^3(\hat\theta_{n_0+3} - \hat\theta_{n_0})]} = \sum_{k=1}^{n_0} \dfrac{k^3\exp[k^3(\hat\theta_{n_0+3} - \hat\theta_{k-1})]}{1+\exp[k^3(\hat\theta_{n_0+3} - \hat\theta_{k-1})]}$ 

cannot hold since, in view of (3.6) and (3.7), 

(A.11)  left-hand side of (A.10) $\le \dfrac{(n_0+3)^3}{1+\exp(n_0+3)} + \dfrac{(n_0+2)^3}{1+\exp(n_0+2)} + \dfrac{1}{6} \le \dfrac{1}{2}.$ 

Furthermore, 

(A.12)  right-hand side of (A.10) $> \dfrac{n_0^3\exp[n_0^3(\hat\theta_{n_0+3} - \hat\theta_{n_0-1})]}{1+\exp[n_0^3(\hat\theta_{n_0+3} - \hat\theta_{n_0-1})]} \ge \dfrac{1}{2}.$ 

From (A.11) and (A.12), we obtain the desired contradiction and therefore 
the claim that one of (A.7)-(A.9) must hold is true. 

In view of the preceding derivations, we have, for k = 1, 2, 3, 

(A.13)  $\hat\theta_{n_0+k} < \hat\theta_0 + \sum_{j=n_0+1}^{n_0+k} \dfrac{1}{j^2}.$ 

We now apply mathematical induction to show that (A.13) holds for 
every k. Suppose it is true for $k \le j$. We claim that one of the following 
must hold: 

(A.U) 

(A. 15) ^no+j+l - Gno+k < ^ - 

This can be proved by showing that if none of the above inequalities holds, 
then the likelihood equation for $\hat\theta_{n_0+j+1}$ implies 

no+j+l 



St 



which is a contradiction to (3.6). Clearly (A.14) or (A.15) and the induction 
assumption imply that (A.13) holds with k = j + 1. Hence, (A.13) holds for 
every k on event A. Thus, on A, 

no+k 

limsup^„<^o+ X! ~^ 

j=n.o+l 

(A.16) <e_i_ + - 

j=no+l 

<e-i. □ 